Relode the page if the visualizations don't show up¶

Exploratory Data Analysis of Crop Yields and Applications of Pesticides¶

-By Fahmid Hasan (B.Sc.Ag(Hons), Bangaldesh Agricutural University, Mymensingh)¶

Importing all necessary libraries¶

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import plotly.io as pio
from scipy.stats import pearsonr
pio.renderers.default = 'notebook'  

Loading the Data¶

In [2]:
pesticides=pd.read_csv('pesticides.csv')
yield_df=pd.read_csv('yield_df.csv')

A quick look at the data sets¶

This data set contains 4349 entries with 7 columns¶

In [3]:
pesticides.sample(10)
Out[3]:
Domain Area Element Item Year Unit Value
3777 Pesticides Use Tajikistan Use Pesticides (total) 2010 tonnes of active ingredients 262.10
2490 Pesticides Use Mali Use Pesticides (total) 2006 tonnes of active ingredients 11.87
3668 Pesticides Use Suriname Use Pesticides (total) 2007 tonnes of active ingredients 1847.86
1872 Pesticides Use Iran (Islamic Republic of) Use Pesticides (total) 1991 tonnes of active ingredients 16349.00
957 Pesticides Use Colombia Use Pesticides (total) 2008 tonnes of active ingredients 48538.97
1483 Pesticides Use French Polynesia Use Pesticides (total) 2007 tonnes of active ingredients 48.64
1606 Pesticides Use Guatemala Use Pesticides (total) 1995 tonnes of active ingredients 12004.33
886 Pesticides Use China, mainland Use Pesticides (total) 1991 tonnes of active ingredients 765307.00
4284 Pesticides Use Yemen Use Pesticides (total) 2008 tonnes of active ingredients 79.57
1681 Pesticides Use Guinea-Bissau Use Pesticides (total) 2016 tonnes of active ingredients 82.00
In [4]:
pesticides.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4349 entries, 0 to 4348
Data columns (total 7 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Domain   4349 non-null   object 
 1   Area     4349 non-null   object 
 2   Element  4349 non-null   object 
 3   Item     4349 non-null   object 
 4   Year     4349 non-null   int64  
 5   Unit     4349 non-null   object 
 6   Value    4349 non-null   float64
dtypes: float64(1), int64(1), object(5)
memory usage: 238.0+ KB

This Data set contains 28242 entries with 7 columns¶

In [5]:
yield_df.sample(10)
Out[5]:
Unnamed: 0 Area Item Year hg/ha_yield average_rain_fall_mm_per_year pesticides_tonnes avg_temp
19535 19535 Mexico Sweet potatoes 1998 187500 758.0 34468.93 17.07
881 881 Australia Sweet potatoes 1991 151142 534.0 17866.00 20.29
18517 18517 Mali Potatoes 1991 175000 282.0 54.63 27.58
5632 5632 Canada Potatoes 1992 118173 537.0 29387.00 6.25
14522 14522 India Sweet potatoes 2013 101288 1083.0 45620.00 25.85
26821 26821 Turkey Wheat 2001 20765 593.0 25539.00 12.32
20936 20936 Mozambique Rice, paddy 2011 11366 1032.0 766.93 21.69
21205 21205 Netherlands Maize 2009 129954 778.0 9857.20 10.33
11898 11898 India Wheat 1997 26789 1083.0 52279.00 24.83
3340 3340 Brazil Cassava 1998 123522 1761.0 118930.56 27.54
In [6]:
yield_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28242 entries, 0 to 28241
Data columns (total 8 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Unnamed: 0                     28242 non-null  int64  
 1   Area                           28242 non-null  object 
 2   Item                           28242 non-null  object 
 3   Year                           28242 non-null  int64  
 4   hg/ha_yield                    28242 non-null  int64  
 5   average_rain_fall_mm_per_year  28242 non-null  float64
 6   pesticides_tonnes              28242 non-null  float64
 7   avg_temp                       28242 non-null  float64
dtypes: float64(3), int64(3), object(2)
memory usage: 1.7+ MB

Exploratory Data Analysis and Visualization (Interactive Charts) using Plotly¶

Topics:¶

  1. Top ten Countries used the highest amount of pesticides in the year of 2016

  2. Pesticide usage in the number one country.

  3. Application of pesticides in Bangladesh from 1990 to 2016.

  4. Correlation Analysis (if there any kind of relationship between Pesticide application and the Yield of Crops)

  5. Pesticide application and the Yields of some dominant crops in Bangladesh.

  6. No.4 Analysis of Rice (The main dominating crop in Bangladesh)

  7. Uneven distribution of Yields and Pesticide application in some past Years

  8. Some Basic Analysis.

1. Top ten Countries used the highest amount of pesticides in the year of 2016¶

In [7]:
ten_pesticides_country=pesticides[pesticides.Year==2016][['Area','Value']].sort_values(by='Value',ascending=False).head(10)
In [8]:
	
fig=px.bar(ten_pesticides_country,
           y='Area',
           x='Value',
           title='Top ten countries which used the highest amount of pesticides in 2016',
           color=ten_pesticides_country.Area)
fig.update_layout(
    yaxis_title='Country',
    xaxis_title='Tonnes of active ingredients',
    width=1200,
    height=700
)
fig.show()

China(Main land) used the highest amount of Pesticides in the year of 2016¶

2. Pesticides Application in China (Different regions) from 1990 to 2016¶

In [9]:
pesticides_china=pesticides[pesticides.Area.isin(['China, Hong Kong SAR', 'China, Macao SAR','China, mainland', 'China, Taiwan Province of'])][['Area','Year','Value']]
In [10]:
fig=px.bar(
    pesticides_china,
    x='Year',
    y='Value',
    color='Area',
    title='Pesticides Usage in China from 1990 to 2016',
)
fig.update_layout(
    yaxis_title='Tonnes of active ingredients',
    width=1200,
    height=700
)
fig.show()

Mainland of China used the highest amount of Pesticides compared to other regions¶

3. Pesticides Usage over the past Years in BD (from 1990 to 2013.)¶

In [11]:
yield_df=yield_df.drop(columns='Unnamed: 0',inplace=False,axis=1)
yield_df['hg/ha_yield']=yield_df['hg/ha_yield']/10000
yield_df=yield_df.rename(columns={'hg/ha_yield':'ton/ha_yield'})
yield_bd=yield_df[yield_df.Area=='Bangladesh']
In [12]:
yield_pest_bd=yield_bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [13]:
fig=px.bar(
    yield_pest_bd,
    x=yield_pest_bd.index,
    y='pesticides_tonnes',
    color='ton/ha_yield',
    title='Pesticides Usage over the past Years in BD (from 1990 to 2013.'
)

fig.update_layout(yaxis_title='Pesticide Dose (Tonnes)',
                  width=1200,
                  height=700)
fig.show()

The application of pesticides has increased exponentially over the past years¶

4. Correlation Analysis (if there any kind of relationship between Pesticide application and the Yield of Crops)¶

In [14]:
corr_df=yield_df.corr(numeric_only=True)
corr_df
Out[14]:
Year ton/ha_yield average_rain_fall_mm_per_year pesticides_tonnes avg_temp
Year 1.000000 0.091630 -0.003798 0.140930 0.014409
ton/ha_yield 0.091630 1.000000 0.000962 0.064085 -0.114777
average_rain_fall_mm_per_year -0.003798 0.000962 1.000000 0.180984 0.313040
pesticides_tonnes 0.140930 0.064085 0.180984 1.000000 0.030946
avg_temp 0.014409 -0.114777 0.313040 0.030946 1.000000

Correlation Heatmap¶

In [15]:
plt.figure(figsize=(12,6))
sns.heatmap(
    data=corr_df,
    cmap='Greys'
)
plt.title('Correlations of the Numerical variables')
Out[15]:
Text(0.5, 1.0, 'Correlations of the Numerical variables')
No description has been provided for this image

3d scatter plot¶

In [57]:
fig=px.scatter_3d(yield_df.sample(2000),
               x='pesticides_tonnes',
               y='Year',
               z='ton/ha_yield',
               color='Item',
               size='ton/ha_yield',
               hover_data=['Item'],
               hover_name='Area',
               title='A Scatterplot of Crops Yield vs Pesticide usage from (1990 to 2013) '
               )
fig.update_layout(
    scene=dict(
    yaxis_title='Year',
    xaxis_title='Pesticides Dose (Tonnes)',
    zaxis_title='Yield (tonne/Ha)',),
    width=1200,
    height=700
)
fig.show()

Potato has the highest Tonne/Ha yield.¶

Pesticides usage is very high in Brazil compared to other countries¶

2d scatter plot¶

In [52]:
fig=px.scatter(yield_df.sample(2000),
               x='pesticides_tonnes',
               y='ton/ha_yield',
               color='Item',
               size='ton/ha_yield',
               hover_data=['Item'],
               hover_name='Area',
               title='Crops Yield vs Pesticide usage from (1990 to 2013'
               )
fig.update_layout(
    yaxis_title='Yield (Tonne/Ha)',
    xaxis_title='Pesticides Dose (Tonnes)',
    width=1200,
    height=700
)
fig.show()
In [18]:
correlation_AVG= yield_df[['ton/ha_yield','pesticides_tonnes']].corr()
print(correlation_AVG)
                   ton/ha_yield  pesticides_tonnes
ton/ha_yield           1.000000           0.064085
pesticides_tonnes      0.064085           1.000000

Some countries use less amount of pesticides and get higher Crop yields and vice versa. Brazil uses a significant amount of pesticides, but tonne/ha yield is not that much compared to other countries like Switzerland, Germany, the United Kingdom, etc.¶

Here the correlation coefficient between ton/ha_yield and pesticides_tonnes is approximately 0.0641. Which is very close to 0, indicating a very weak positive linear relationship between pesticide usage and crop yield.¶

4.1 Correlation Analysis In case of Bangladesh¶

3d Scatterplot¶

In [53]:
fig=px.scatter_3d(yield_bd,
               x='pesticides_tonnes',
               y='Year',
               z='ton/ha_yield',
               color='Item',
               size='ton/ha_yield',
               hover_data=['Item'],
               hover_name='Area',
               title='Crops Yield vs Pesticide usage in BD (1990 to 2013)'
               )
fig.update_layout(
    scene=dict(
    yaxis_title='Year',
    xaxis_title='Pesticides Dose (Tonnes)',
    zaxis_title='Yield (tonne/Ha)',),
    width=1200,
    height=700
)
fig.show()

2d Scatterplot¶

In [20]:
fig=px.scatter(yield_bd,
               x='pesticides_tonnes',
               y='ton/ha_yield',
               size='pesticides_tonnes',
               color='pesticides_tonnes',
               hover_data=['Item'],
               hover_name='Area',
               title='Crops Yield vs Pesticide usage in BD (1990 to 2013')

fig.update_layout(
    yaxis_title='Yield (Tonne/Ha)',
    xaxis_title='Pesticides Dose (Tonnes)',
    width=1200,
    height=700)
fig.show()
In [21]:
correlation_BD= yield_bd[['ton/ha_yield','pesticides_tonnes']].corr()
print(correlation_BD)
                   ton/ha_yield  pesticides_tonnes
ton/ha_yield            1.00000            0.14386
pesticides_tonnes       0.14386            1.00000

Same goes here, the correlation coefficient is 0.14386 Which is close to 0, indicating a weak positive linear relationship between pesticide usage and crop yield.¶

5. Pesticide application and the Yields of some dominant crops in Bangladesh.¶

Line Chart (3d)¶

In [22]:
fig=px.line_3d(yield_bd,
            x='pesticides_tonnes',
            z=yield_bd['ton/ha_yield'],
            y='Year',
            hover_data=['Year'],
            color='Item',
            title='Pesticides usage on the yield of some dominent crops in Bangladesh from 1990 to 2013')

fig.update_layout(
    width=1200,
    height=700
)
fig.show()

Line Chart(2d)¶

In [23]:
fig=px.line(yield_bd,
            x='pesticides_tonnes',
            y=yield_bd['ton/ha_yield'],
            hover_data=['Year'],
            color='Item',
            title='Pesticides usage on the yield of some dominent crops in Bangladesh from 1990 to 2013')

fig.update_layout(
    yaxis_title='Yield (Tonne/Ha)',
    xaxis_title='Pesticides Dose (Tonnes)',
    width=1200,
    height=700
)
fig.show()

Potatoes tonne/ha yield is very high compared to other major crops. There was some sort of uneven distribution of pesticides and the yield of crops from 2008 to 2012¶

6. No.5 Analysis for Rice (The main dominating crop in Bangladesh)¶

In [24]:
yield_bd_rice=yield_bd[yield_bd.Item=='Rice, paddy']
In [25]:
fig=px.line(yield_bd_rice,
            x='pesticides_tonnes',
            y='ton/ha_yield',
            title='Effects of pesticides usage on Rice yield in Bangladesh from 1990 to 2013',
            hover_data=['Year']
            )
fig.update_layout(
    yaxis_title='Yield (Tonne/Ha)',
    xaxis_title='Pesticides Dose (Tonnes)',
    width=1200,
    height=700
)
fig.show()

7. Uneven distribution of Crop Yields and Pesticide application in some of the past Years (2008 to 20012)¶

In [26]:
yield_8_9_10_11_12=yield_df[yield_df.Year.isin(range(2008,2013))]
yield_8_9_10_11_12_bd=yield_8_9_10_11_12[yield_8_9_10_11_12.Area=='Bangladesh']
yield_pests_8to12=yield_8_9_10_11_12_bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [27]:
fig=px.bar(yield_pests_8to12,y='pesticides_tonnes',
            x=yield_pests_8to12.index,
            title='Pesticides usage and yield of crops from 2008 to 20012',
            color='ton/ha_yield'
            )
fig.update_layout(
    yaxis_title='Pesticides Dose (Tonnes)',
    xaxis_title='Year',
    width=1200,
    height=700
)
fig.show()

Both the application of pesticide and yield of crops were unevenly distributed. There could be some problem with the Data entry in those years¶

7.1 However, the distribution ascended in some of the previous years (2000 to 2007)¶

In [28]:
yield_2000to2007=yield_df[yield_df.Year.isin([2000,2001,2002,2004,2005,2006,2007])]
yield_2000to2007bd=yield_2000to2007[yield_2000to2007.Area=='Bangladesh']
yield_2000to2007bd=yield_2000to2007bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [29]:
fig=px.bar(yield_2000to2007bd,y='pesticides_tonnes',
            x=yield_2000to2007bd.index,
            title='Pesticides usage from 2000 to 2007 in BD',
            color='ton/ha_yield'
            )
fig.update_layout(
    yaxis_title='Pesticides dose (Tonnes)',
    xaxis_title='Year',
    width=1200,
    height=700
    )

8. Some Basic analysis¶

Top ten Rice Producing Countries in 2013¶

In [30]:
rice_df=yield_df[yield_df.Item=='Rice, paddy']
rice_df=rice_df[rice_df.Year==2013]
rice_df
top_ten_rice_df_13=rice_df.groupby('Area')['ton/ha_yield'].sum()
top_ten_rice_df_13=top_ten_rice_df_13.sort_values(ascending=False)
top_ten_rice_df_13=top_ten_rice_df_13.head(10)
In [31]:
fig=px.bar(
    top_ten_rice_df_13,
    x=top_ten_rice_df_13.index,
    y=top_ten_rice_df_13,
    title='Top ten Rice producing countries in 2013',
    color=top_ten_rice_df_13.index
)
fig.update_layout(
    xaxis_title='Country',
    yaxis_title='Yield (Tonne/Ha)',
    width=1200,
    height=700
)
fig.show()

Ten highest crops (Maize, Potatoes, Rice, Sorghum, Soybeans, Wheat, Cassava, Sweet potatoes, Plantains and others, Yams) yielding countries from 1990 to 2013¶

In [32]:
highest_crops_yield=yield_df.groupby('Area')['ton/ha_yield'].sum()
highest_crops_yield=highest_crops_yield.sort_values(ascending=False)
ten_highest_crops_yield=highest_crops_yield.head(10)
In [33]:
fig=px.bar(
    ten_highest_crops_yield,
    x=ten_highest_crops_yield.index,
    y=ten_highest_crops_yield,
    title='Ten highest crop yielding Countries (1990 to 2013)',
    color=ten_highest_crops_yield.index
)

fig.update_layout(
xaxis_title='Country',
yaxis_title='Yield (Tonne/Ha)',
width=1200,
height=700
)
fig.show()